AARHUS
UNIVERSITY
Department of Biology
Climate Data with the
KrigR Pipeline
EFFICIENT DATA RETRIEVAL AND PROCESSING FOR INDIVIDUAL STUDY REQUIREMENTS
Erik Kusch, PhD Student
Department of Biology
Section for Ecoinformatics & Biodiversity
Center for Biodiversity Dynamics in a Changing World (BIOCHANGE)
Aarhus University
Data Requirements for the 21st century
Holy Trinity of Climate Data
Contemporary Data - Accuracy
Large areas of legacy data sets suffer from low reliability
No data uncertainty/quality markers
Spatial resolutions: 1 x 1 km
Too demanding for the data sources and interpolation techniques used for legacy data
Contemporary Data - Temporal Resolution
Most legacy data sets offer monthly intervals of data
Natural processes rarely happen at neat
monthly intervals
Temporal variability must be accounted for
in quantification of extreme events
[Figure: biologically relevant gaps (rapid growth, rapid decline, neutral) and extreme events. © Connor Bernard]
Contemporary Data - Variables
Legacy data sets offer insufficient coverage of the Essential Climate Variable (ECV) spectrum
ECVs offered by legacy data sets:
e.g.: WorldClim: ~25 climate variables
19 bioclimatic variables
Mostly raw data and derivatives of:
-Air Temperature
-Precipitation
Wind, Radiation
Data is rarely available at higher resolutions than monthly intervals
Much of this data is interpolated
-Problematic with certain ECVs
-Interpolation methods usually not made publicly available
The Holy Trinity & Reformation Needs
Come on, man.
Climate Reanalyses are the Solution
Variables offered by climate reanalyses:
e.g.: ERA5(-Land): ~83 variables
Covering all ECVs indexing important
components of ecosystems such as:
Atmosphere
Soil properties
Resolution:
Space: 9km
Time: hourly intervals
Accuracy:
Good fit to nature due to data
assimilation practices
Why isn’t everyone using reanalyses?
Data Retrieval
Download Prerequisites
-CDS account & API key (generated on the CDS website)
-ecmwfr package
-Very unintuitive download specification
-No processing of data
User Needs:
-More intuitive download specification
-Spatial limiting of data beyond rectangular extents
-Aggregation to desired temporal resolutions
Spatial Resolution
User Demands:
-1km resolution
Statistical Interpolation:
-Let users define the resolutions they need
-Downscale reanalysis data from native
resolutions on the user-end
KrigR provides a toolbox for all of this!
R-internal Workflow - 3 Steps
1. Climate Data: ERA5(-Land) data / spatial product
2. Covariates: covariates at training resolution & covariates at target resolution
3. Kriging: downscaling yields the downscaled product & its standard error
Climate Data - The Function Call
Arguments annotated in the download_ERA() call:
-Climate Variable
-Reanalysis Data Product
-Time-Window
-Temporal Resolution
-Geographical Region
-Directory where NETCDF will be stored
-Name for NETCDF output
-CDS API Credentials
download_ERA() has more arguments.
Climate Data - Shapefiles & Locations
download_ERA() can take shapefiles and point-locations.
Climate Data - Time-Series
The FUN argument in download_ERA() gives you control over aggregate metrics.
By setting the TResolution and TStep arguments in download_ERA(), you can achieve any temporal resolution you want.
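The idea of combining a step width with an aggregate function can be sketched outside of R. The following Python snippet is a minimal illustration only, not KrigR's implementation: the helper name aggregate_steps and its parameters step and fun are hypothetical stand-ins for the roles that TStep and FUN play in download_ERA().

```python
from statistics import mean
from typing import Callable, List, Sequence


def aggregate_steps(
    values: Sequence[float],
    step: int,
    fun: Callable[[Sequence[float]], float] = mean,
) -> List[float]:
    """Collapse consecutive blocks of `step` records into one value each.

    Hypothetical helper: `step` and `fun` only mimic the roles of TStep and FUN.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    # Only complete blocks are aggregated; a trailing partial block is dropped.
    n_blocks = len(values) // step
    return [fun(values[i * step:(i + 1) * step]) for i in range(n_blocks)]


# Two days of made-up hourly values: 0..23 on day one, 10..33 on day two.
hourly = list(range(24)) + list(range(10, 34))
daily_mean = aggregate_steps(hourly, step=24, fun=mean)
daily_max = aggregate_steps(hourly, step=24, fun=max)
```

Swapping fun for min, max or any other summary function changes the aggregate metric without touching the step width, which is the separation of concerns the two arguments suggest.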
Climate Data - Efficient Downloads
Downloads of fewer than 100,000 layers of data can be forced to be staged as one call to the CDS with SingularDL.
Downloads can be sped up by staging them in parallel rather than sequentially using Cores.
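As a rough illustration of staging a large request as several smaller ones and running them in parallel, consider the Python sketch below. It rests on assumptions: the slides do not describe how KrigR splits or stages requests, so split_layers, stage_request, stage_all and the chunk size of 744 layers (the hours in a 31-day month) are all hypothetical, and stage_request is a placeholder that contacts no server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Chunk = Tuple[int, int]  # half-open range of layer indices: [start, stop)


def split_layers(n_layers: int, max_per_request: int) -> List[Chunk]:
    """Split layer indices 0..n_layers-1 into chunks of at most max_per_request."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be a positive integer")
    return [
        (start, min(start + max_per_request, n_layers))
        for start in range(0, n_layers, max_per_request)
    ]


def stage_request(chunk: Chunk) -> int:
    """Placeholder for one download request; returns the number of layers staged."""
    start, stop = chunk
    return stop - start


def stage_all(chunks: List[Chunk], cores: int = 1) -> List[int]:
    """Stage all chunks, using up to `cores` workers at the same time."""
    with ThreadPoolExecutor(max_workers=cores) as pool:
        # pool.map keeps the results in the same order as the input chunks.
        return list(pool.map(stage_request, chunks))


# One non-leap year of hourly layers, split into month-sized requests.
chunks = split_layers(n_layers=8760, max_per_request=744)
staged = stage_all(chunks, cores=4)
```

With cores=1 the chunks are staged one after another; a higher value only changes how many requests are in flight at once, not which layers are requested.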
Climate Data - Consideration
Climate variables which are provided as cumulative records can be back-transformed into individual records by toggling the PrecipFix argument.
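Back-transforming cumulative records amounts to differencing neighbouring records. The Python sketch below shows that arithmetic under one stated assumption: the running total restarts at a fixed interval (reset_every). The slides do not say how PrecipFix handles this internally, so decumulate is a hypothetical helper and the numbers are made up.

```python
from typing import List, Sequence


def decumulate(cumulative: Sequence[float], reset_every: int) -> List[float]:
    """Turn running totals that restart every `reset_every` records into per-record amounts.

    Assumption: the first record after each restart already is a per-record amount.
    """
    if reset_every < 1:
        raise ValueError("reset_every must be a positive integer")
    amounts = []
    for i, value in enumerate(cumulative):
        if i % reset_every == 0:
            # Start of a new accumulation period: nothing to subtract.
            amounts.append(value)
        else:
            # Within a period: amount = current total minus previous total.
            amounts.append(value - cumulative[i - 1])
    return amounts


# Made-up running totals that restart every four records.
running_totals = [0.0, 1.0, 1.0, 3.5, 0.5, 0.5, 2.0, 2.0]
per_record = decumulate(running_totals, reset_every=4)
```

Summing per_record within each period recovers the last running total of that period, which is a quick sanity check for any back-transformation of this kind.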
Covariates - The Function Call
KrigR provides USGS GMTED2010 digital elevation model data as interpolation covariates.
Arguments annotated in the function call:
-Data you want to downscale
-Target resolution (either a number, or a raster object whose resolution to match)
-Optional: shapefiles or points as specified in download_ERA()
-Whether to keep the GMTED2010 data set on your hard drive
Kriging - The Function Call
krigR() gives you parallel processing of multi-layer rasters and allows for pausing and restarting kriging via temporary files.
Arguments annotated in the krigR() call:
-Data you want to downscale
-Covariates at training resolution
-Covariates at target resolution
-How many cores to use for kriging
-Localisation of kriging
-Directory where NETCDF will be stored
-Name for NETCDF output
-Whether to delete the temporary files (1/layer) upon completion
Kriging uncertainty can help you understand the quality and robustness of your interpolated data.
KrigR Aggregate Uncertainty
By accessing reanalysis ensembles using the Type argument in download_ERA(), you can obtain dynamical uncertainty.
R-internal Workflow - 3 Steps
[Workflow diagram repeated: Climate Data, Covariates, Kriging, now with third-party data as an additional input.]
Kriging Uncertainty
1. Statistical interpolation uncertainty remains constant across temporal scales.
2. Dynamical uncertainty of the underlying data diminishes as time-scales increase.
3. Both sources of uncertainty are important and should be propagated into downstream analyses.
Kriging Accuracy
Testing Kriging Accuracy:
1. Krig upscaled ERA5-Land data to native resolution
2. Difference of upscaled & interpolated product
3. Total uncertainty of kriged product
Results:
1. Kriging outperforms most other interpolation methods.
2. Kriging is highly accurate for a variety of ECVs.
3. Where Kriging is not the most accurate method, it is the only one that produces uncertainty estimates.
KrigR Products & Legacy Data
1. KrigR products do not align with most legacy products.
2. Particularly in topographically heterogeneous regions, KrigR seems most reliable (i.e. accurate) and informative (through provision of uncertainty metrics) to us.
Using KrigR
Downloads ERA5(-Land)
-Automatically breaks download requests into manageable chunks
-Aggregates hourly/monthly data to user-specified temporal resolution and metric
-Masks data according to shapefile / locations
Provides GMTED2010 data as covariates
Kriging of climate data
-Checks for common errors
-Parallelised processing of multi-band rasters
-Exports kriging uncertainty
Temporary Files while kriging with krigR()
-Temporary directory created in the directory indicated by Dir
-Individual NETCDF (.nc) files for the kriging prediction and standard error of each layer in the input data
-Removed automatically upon completion if Keep_Temporary = FALSE
Why?
-You can put kriging on hold; krigR() checks for the presence of temporary files on each run
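The pause-and-resume behaviour that temporary files enable can be sketched generically. The Python snippet below is an illustration under assumptions, not krigR()'s code: process_layers and slow_work are hypothetical, JSON files stand in for the per-layer NETCDF files, and keep_temporary only mimics the role of Keep_Temporary.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence


def process_layers(
    layers: Sequence[float],
    work: Callable[[float], float],
    temp_dir: Path,
    keep_temporary: bool = False,
) -> List[float]:
    """Process each layer once, reusing per-layer temporary files from earlier runs."""
    temp_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for index, layer in enumerate(layers):
        temp_file = temp_dir / f"layer_{index}.json"
        if temp_file.exists():
            # An earlier (possibly interrupted) run already finished this layer.
            result = json.loads(temp_file.read_text())
        else:
            result = work(layer)
            temp_file.write_text(json.dumps(result))
        results.append(result)
    if not keep_temporary:
        # Clean up only after every layer has been processed.
        for leftover_file in temp_dir.glob("layer_*.json"):
            leftover_file.unlink()
    return results


calls = []  # records which layers were actually computed


def slow_work(layer: float) -> float:
    """Stand-in for an expensive per-layer computation."""
    calls.append(layer)
    return layer * 2


with tempfile.TemporaryDirectory() as root:
    temp = Path(root) / "Temp"
    first = process_layers([1, 2, 3], slow_work, temp, keep_temporary=True)
    # The second run finds all temporary files and recomputes nothing.
    second = process_layers([1, 2, 3], slow_work, temp, keep_temporary=False)
    leftover = list(temp.glob("layer_*.json"))
```

Because each layer is written to disk as soon as it is finished, an interrupted run loses at most the layer that was in progress.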
The Efficiency Of The KrigR Workflow
KrigR offers a flexible, efficient workflow
Data Storage & Downloads
-Only download and store the data you need
-Reducing storage demands and download sizes
Data Handling & Processing
-Any temporal resolution & aggregate metric you need
-Reliable & informative statistical interpolation via Kriging
Computational Cost
-Processing time scales close to exponentially with extent and downscaling factor
-Work at the minimally required scales or use localised kriging
The Road Ahead For KrigR
What’s still to come?
More efficient downloads
-Automatic splitting of download requests into maximum layer numbers
Support for more ECMWF data sets
-ECMWF has voiced development interest
KrigR methodology for climate projections & data products
KrigR-based critique of bioclimatic variables & development of novel metrics
Use-case studies:
-Life-History Trade-offs (ongoing project)
-Species distributions (planned project)